Method and apparatus for motion compensated temporal interpolation of video sequences

ABSTRACT

A method and device for processing video signals are provided. The method comprises the steps of: selecting a motion vector having the least mean square error (MSE) from an M×N block; applying a vector filter to the motion vector; and, if the filter output MSE is below a preset threshold, selecting the filtered motion vector as the filter output, and if above the threshold, selecting the input vector as the filter output. A P×Q nearest neighbor motion vector substitution is performed on the output motion vector field, where P is less than M and Q is less than N. An S×T nearest neighbor substitution is then performed on the output motion vector field, where S is less than P and T is less than Q.

This application claims the benefit under 35 U.S.C. § 365 of International Application PCT/US01/26514 filed Aug. 27, 2001, which claims the benefit of U.S. Provisional Application No. 60/228,362, filed Aug. 28, 2000.

BACKGROUND OF THE INVENTION

1. Technical Field

The invention relates generally to the processing of video signals, and more particularly to the interpolation of temporally consecutive video frames.

2. Description of the Prior Art

Block-based minimum mean-squared error (MSE) motion estimation techniques, used to interpolate motion between two temporally-consecutive frames of a video sequence, provide the best match for each block but do not necessarily provide motion vectors which represent true motion. Previously proposed motion compensated temporal interpolation (MCTI) algorithms have used modified motion estimation methods for estimating motion to ultimately display a reasonable reconstructed version of the image transmitted. When extraction of the motion is not precise, the reconstructed image will not be smooth and will include discontinuities. Therefore, there exists a need for an improved scheme for motion compensated interpolation of video sequences between temporally-related video frames.

SUMMARY OF THE INVENTION

The instant invention is a method for motion compensated temporal interpolation of video frames using filtered motion-vector fields. The method encompasses filtering the motion-vector field produced by a standard block-based minimum mean-squared error (MSE) motion estimator. Upon filtering, MPEG-encoded motion vectors can be used for interpolation, and use of a specialized motion compensated temporal interpolation (MCTI) estimator is not necessary.

In one exemplary embodiment of the invention, filtering of received motion vector fields is accomplished by first processing the motion vector field at block size 32×32. For each 32×32 block, compute the MSE for each of the received motion vectors on the block (or a subset thereof) and select a motion vector which minimizes the MSE for the 32×32 block. Second, process the motion vector field at block size 16×16, apply a nonlinear vector filter to the motion vector field, and then perform 16×16 nearest-neighbor motion vector substitution. Third, process the motion vector field at block size 8×8 and apply 8×8 nearest-neighbor motion vector substitution.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a flow diagram of the filtering method according to a preferred embodiment of the present invention.

FIG. 2 shows a plot of the motion vector field of the video data as received prior to processing according to the present invention.

FIG. 3 shows a plot of the motion vector field after processing according to the present invention.

DETAILED DESCRIPTION

The invention is directed to a process for increasing the frame rate of a video sequence by interpolating new frames between temporally-consecutive frames of the original sequence. The temporal interpolation is performed along motion trajectories using motion vectors between each pair of temporally-consecutive original frames.

One exemplary embodiment of the process according to the principles of the present invention is implementable with a computer having a Pentium IV or equivalent class processor, and RAM and ROM memory for storing codes executable by the processor to perform the process according to the invention.

Referring to FIG. 1, image data coded into blocks is received, the blocks having motion vectors (step 100). The motion vector field is first processed at 32×32 pixel block size. For each 32×32 block, the mean-squared error (MSE) is computed using each of the received motion vectors within that 32×32 block (step 110). The motion vector having the least MSE is selected (step 120), and its MSE is tested against a predetermined threshold (step 130). If the minimum MSE is greater than the preset threshold value, there is no match and the block is marked as having no match (step 140). Otherwise, the motion vector minimizing the MSE is selected as the motion vector for the 32×32 block.
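The 32×32 selection step (steps 110 to 140) can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names `block_mse` and `select_block_vector` are hypothetical, frames are assumed to be lists of pixel rows, motion vectors are assumed to be whole-pixel displacements, and displaced coordinates are clamped to the frame boundary.

```python
def block_mse(cur, ref, x0, y0, size, mv):
    """MSE between a block of `cur` and the block of `ref` displaced by `mv`.

    Assumptions of this sketch: frames are lists of rows, `mv` is an
    integer-pel (dx, dy) pair, and out-of-frame coordinates are clamped.
    """
    dx, dy = mv
    h, w = len(ref), len(ref[0])
    total = 0
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            ry = min(max(y + dy, 0), h - 1)
            rx = min(max(x + dx, 0), w - 1)
            diff = cur[y][x] - ref[ry][rx]
            total += diff * diff
    return total / (size * size)


def select_block_vector(cur, ref, x0, y0, size, candidates, threshold):
    """Pick the candidate vector with the least MSE for one block.

    Returns (mv, mse); mv is None when the minimum MSE exceeds the
    threshold, i.e. the block is marked as having no match.
    """
    best_mv, best_mse = None, float("inf")
    for mv in candidates:
        mse = block_mse(cur, ref, x0, y0, size, mv)
        if mse < best_mse:
            best_mv, best_mse = mv, mse
    if best_mse > threshold:
        return None, best_mse
    return best_mv, best_mse
```

Here `candidates` would hold the received motion vectors lying within the 32×32 block (or a subset of them), and `size` would be 32.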

The chosen motion vector is then further localized in successively smaller blocks. A vector filter is applied to the chosen motion vector (step 150). The filter is preferably nonlinear, such as the R_(E) filter described in “Ranking in R^(P) and its use in multivariable image estimation”, Hardie and Arce, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 1, No. 2, pp. 197–209, June 1991. Application of the nonlinear filter removes the noisier components of the motion vector data. After the filter has been applied, the filtered motion vector is tested for each 16×16 block to determine whether the filtered output motion vector can be accepted or should be replaced by the input motion vector to the filter for that block. This test is needed because the filter is applied to the motion vector field without any consideration of the pixel data of the frames being interpolated. For a given 16×16 block, let the MSE obtained using the input to the filter be MSE_(in) and let the MSE obtained using the output of the filter be MSE_(out). If MSE_(out) < 16×MSE_(in), the filtered output motion vector is selected for the block. Otherwise the input motion vector is retained for the block (step 160).

Then, 16×16 nearest neighbor substitution is performed. For each 16×16 block, the current motion vector for that block is tested against the motion vectors of the four nearest neighboring 16×16 blocks. The motion vector which minimizes the MSE for the 16×16 block is selected as the output of the block group (step 170). As at the 32×32 level, if the minimum MSE is above a threshold value, the block is marked as having no match.
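The filter-and-accept step (steps 150 and 160) can be illustrated with the short sketch below. The R_(E) filter itself is not reproduced here; a plain vector median is swapped in as a stand-in nonlinear vector filter, purely for illustration. The function names and the `factor` parameter are hypothetical; only the comparison MSE_(out) < 16×MSE_(in) comes from the description.

```python
def vector_median(vectors):
    """Stand-in nonlinear vector filter (NOT the R_E filter of the text).

    Returns the vector from the window whose summed Euclidean distance
    to all other vectors in the window is smallest.
    """
    def cost(v):
        return sum(((v[0] - u[0]) ** 2 + (v[1] - u[1]) ** 2) ** 0.5
                   for u in vectors)
    return min(vectors, key=cost)


def accept_filtered_vector(mv_in, mv_out, mse_in, mse_out, factor=16):
    """Keep the filter output only if MSE_out < factor * MSE_in.

    Otherwise the input vector to the filter is retained for the block.
    """
    return mv_out if mse_out < factor * mse_in else mv_in
```

The acceptance test exists because the filter looks only at the vector field, so its output must still be checked against the pixel data of the frames before it replaces the input vector.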

The nearest-neighbor substitution process is then applied to 8×8 blocks. For each 8×8 block, the current motion vector for that block is tested against the motion vectors of the four nearest neighboring 8×8 blocks. The motion vector which minimizes the MSE for the 8×8 block is selected as the output for that block group (step 180). Again, the block is marked as having no match if the minimum MSE is above a threshold value.
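Nearest-neighbor substitution at a single block size might look like the following sketch. It assumes the motion vector field is a grid (list of rows) of `(dx, dy)` tuples, with `None` marking a block that has no match, and that a caller-supplied function `mse_fn(bx, by, mv)` returns the MSE of block `(bx, by)` under vector `mv`. These names, and the choice to write results into a fresh grid so that the outcome does not depend on scan order, are assumptions of the example.

```python
def nearest_neighbor_substitution(field, mse_fn, threshold):
    """Replace each block's vector by the best of itself and its 4 neighbors.

    `field` is a grid of (dx, dy) tuples or None (no match).
    `mse_fn(bx, by, mv)` is a hypothetical callback giving the block MSE.
    Blocks whose minimum MSE is above `threshold` are set to None.
    """
    rows, cols = len(field), len(field[0])
    out = [[None] * cols for _ in range(rows)]
    for by in range(rows):
        for bx in range(cols):
            candidates = []
            if field[by][bx] is not None:
                candidates.append(field[by][bx])
            # Four nearest neighbors: above, below, left, right.
            for ny, nx in ((by - 1, bx), (by + 1, bx), (by, bx - 1), (by, bx + 1)):
                if 0 <= ny < rows and 0 <= nx < cols and field[ny][nx] is not None:
                    candidates.append(field[ny][nx])
            if not candidates:
                continue
            best = min(candidates, key=lambda mv: mse_fn(bx, by, mv))
            if mse_fn(bx, by, best) <= threshold:
                out[by][bx] = best
    return out
```

The same routine would serve for both the 16×16 and the 8×8 passes, with the field resampled to the smaller block size before each pass.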

According to this embodiment of the invention, the output of the 8×8 nearest neighbor substitution process is the motion vector field which is used to interpolate between the two input frames (step 190).

Generally, the method encompassed in the principles of the present invention may be summarized as comprising the following generalized steps: selecting a motion vector having the least mean square error from an M×N block; applying a vector filter to the motion vector; and, if the filter output MSE is below a preset threshold, selecting the filtered motion vector as the filter output, and if above the threshold, selecting the input vector as the filter output. A P×Q nearest neighbor motion vector substitution is performed on the output motion vector field, where P is less than M and Q is less than N. An S×T nearest neighbor substitution is then performed on the output motion vector field, where S is less than P and T is less than Q.

FIG. 2 shows the motion vector field prior to processing, in accordance with the principles of the present invention.

FIG. 3 shows the motion vector field after processing, in accordance with the principles of the present invention. It can be seen that the motion vectors appear in random directions prior to processing and are in a substantially common direction after the processing. Advantageously, the processing of the video data according to the present invention extracts the motion vectors having true motion. Interpolation of the true motion vectors results in a smooth reconstructed image.

A representative listing including coding instructions for an interpolation process is shown as follows:

For a received bitstream having progressive material, M=1, regular GOP structure, and spatial resolution J×K, the received bitstream having frame structure P₁, P₃, P₅, P₇, interpolation is performed halfway between two frames to insert a frame between each frame pair. As an example, consider interpolation of the P₃ and P₁ frames to form interpolated frame B₂:

-   The P₃ picture has a J/16×K/16 field of motion vectors M₃(x,y).

Filter M₃(x,y) using the process described above to obtain a new motion vector field m₃(x,y). Use m₃(x,y) to compute frame B₂.

For x=0, . . . , J/16−1; y=0, . . . , K/16−1

-   (dx, dy)=m₃(x,y)
-   x′=16x+dx/4, y′=16y+dy/4
-   For i=0, . . . , 15; j=0, . . . , 15
    -   Compute B₂(x′+j, y′+i):

$$\begin{aligned}
B_{2}(x'+j,\, y'+i) = \big[\, & a_{1}\,P_{1}(x'+j+dx/4,\; y'+i+dy/4) \\
{}+{} & b_{1}\,P_{1}(x'+j+dx/4+1,\; y'+i+dy/4) \\
{}+{} & c_{1}\,P_{1}(x'+j+dx/4,\; y'+i+dy/4+1) \\
{}+{} & d_{1}\,P_{1}(x'+j+dx/4+1,\; y'+i+dy/4+1) \\
{}+{} & a_{3}\,P_{3}(x'+j-dx/4-1,\; y'+i-dy/4-1) \\
{}+{} & b_{3}\,P_{3}(x'+j-dx/4,\; y'+i-dy/4-1) \\
{}+{} & c_{3}\,P_{3}(x'+j-dx/4-1,\; y'+i-dy/4) \\
{}+{} & d_{3}\,P_{3}(x'+j-dx/4,\; y'+i-dy/4) \,\big] / 16
\end{aligned}$$

where

-   a₁=(4−dx % 4)*(4−dy % 4)
-   b₁=(dx % 4)*(4−dy % 4)
-   c₁=(4−dx % 4)*(dy % 4)
-   d₁=(dx % 4)*(dy % 4)
-   a₃=(dx % 4)*(dy % 4)=d₁
-   b₃=(4−dx % 4)*(dy % 4)=c₁
-   c₃=(dx % 4)*(4−dy % 4)=b₁
-   d₃=(4−dx % 4)*(4−dy % 4)=a₁

If B₂(x′+j, y′+i) has already been computed, keep a running average, or keep all computed values and average them, or do not replace the old value with the new value, or discard the old and new values and use the zero motion vector.
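The weight definitions and the eight-term sum in the listing can be exercised with the sketch below. Several points are assumptions of the example and not statements from the listing: the function names are hypothetical; frames are lists of pixel rows with coordinates clamped at the borders; `dx // 4` and `dx % 4` follow Python's floor semantics, since the listing does not say how negative components are split; and the result is divided by `norm=32` because the eight weights always sum to 32, whereas the listing as written divides by 16.

```python
def bilinear_weights(dx, dy):
    """Weights (a1, b1, c1, d1) for P1 and (a3, b3, c3, d3) for P3.

    Assumes dx, dy carry a fractional part in their low two bits and
    uses Python's non-negative modulo for negative components.
    """
    fx, fy = dx % 4, dy % 4
    a1 = (4 - fx) * (4 - fy)
    b1 = fx * (4 - fy)
    c1 = (4 - fx) * fy
    d1 = fx * fy
    # a3 = d1, b3 = c1, c3 = b1, d3 = a1, as in the listing.
    return (a1, b1, c1, d1), (d1, c1, b1, a1)


def interpolate_pixel(p1, p3, px, py, dx, dy, norm=32):
    """Evaluate the eight-term weighted sum for target pixel (px, py).

    `norm=32` is this sketch's choice (the weights sum to 32); the
    listing itself shows a division by 16.
    """
    (a1, b1, c1, d1), (a3, b3, c3, d3) = bilinear_weights(dx, dy)
    ix, iy = dx // 4, dy // 4

    def at(frame, x, y):
        # Clamp to the frame; border handling is an assumption.
        y = min(max(y, 0), len(frame) - 1)
        x = min(max(x, 0), len(frame[0]) - 1)
        return frame[y][x]

    total = (a1 * at(p1, px + ix, py + iy)
             + b1 * at(p1, px + ix + 1, py + iy)
             + c1 * at(p1, px + ix, py + iy + 1)
             + d1 * at(p1, px + ix + 1, py + iy + 1)
             + a3 * at(p3, px - ix - 1, py - iy - 1)
             + b3 * at(p3, px - ix, py - iy - 1)
             + c3 * at(p3, px - ix - 1, py - iy)
             + d3 * at(p3, px - ix, py - iy))
    return total / norm
```

For example, with `dx % 4 == 0` and `dy % 4 == 0` only a₁ and d₃ are nonzero (each equal to 16), so the sum reduces to one sample from P₁ and one from P₃.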

If pixel (x_(o), y_(o)) has not been visited:

-   If (x_(o), y_(o)) was in an intra-coded macroblock (MB) in P₃ AND P₁(x_(o), y_(o)) was used for prediction
    -   B₂(x_(o), y_(o))=P₃(x_(o), y_(o))
-   Else if (x_(o), y_(o)) was in an intra-coded MB in P₃ AND P₁(x_(o), y_(o)) was not used for prediction
    -   B₂(x_(o), y_(o))=(P₁(x_(o), y_(o))+P₃(x_(o), y_(o)))/2
-   Else if (x_(o), y_(o)) was not in an intra-coded MB in P₃ AND P₁(x_(o), y_(o)) was used for prediction
    -   Use spatial interpolation for B₂(x_(o), y_(o))
-   Else
    -   B₂(x_(o), y_(o))=P₁(x_(o), y_(o))

Nearest-neighbor MV Substitution:

For each block:

-   if not intra, compute MSE using current motion vector
-   for each non-intra neighbor, use neighbor's motion vector to compute MSE
-   find minimum MSE
-   if (minimum MSE<threshold)
    -   select corresponding motion vector
-   else
    -   block is intra

Having described embodiments of the above invention, it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments of the invention disclosed which are within the scope and spirit of the invention as defined by the appended claims.

1. A method of processing video signals in the form of motion vectors generated by motion estimation, the method characterized by the steps of: selecting a motion vector having the least mean square error from a block of size M×N; applying a vector filter to the motion vector, if filter output MSE is below a preset threshold select the filtered motion vector as output filter, and if above the threshold, select input vector as output filter; performing a P×Q nearest neighbor motion vector substitution on the output motion vector field, where P is less than M and Q is less than N; and performing an S×T nearest neighbor substitution on the output motion vector field, where S is less than P and T is less than Q.

2. The method according to claim 1, wherein the step of applying a vector filter is performed using a nonlinear filter.

3. The method according to claim 2, wherein the nonlinear filter is the R_(E) filter.

4. The method according to claim 1, wherein the preset threshold is sixteen times MSE of the input vector.

5. The method according to claim 1, wherein the step of performing P×Q nearest neighbor substitution includes comparing the output motion vector against motion vectors of neighboring P×Q blocks and outputting the motion vector having the smallest MSE.

6. The method according to claim 1, wherein the step of performing S×T nearest neighbor substitution includes comparing the selected motion vector against motion vectors of neighboring S×T blocks and outputting the motion vector having the smallest MSE.

7. The method according to claim 1, wherein M×N is 32×32, P×Q is 16×16 and S×T is 8×8.
8. An apparatus for processing video signals in the form of motion vectors generated by motion estimation, the apparatus characterized by: means for selecting a motion vector having the least mean square error from a block of size M×N; means for applying a vector filter to the motion vector, if filter output MSE is below a preset threshold select the filtered motion vector as output filter, and if above the threshold, select input vector as output filter; means for performing a P×Q nearest neighbor motion vector substitution on the output motion vector field, where P is less than M and Q is less than N; and means for performing an S×T nearest neighbor substitution on the output motion vector field, where S is less than P and T is less than Q.

9. The apparatus according to claim 8, wherein the step of applying a vector filter is performed using a nonlinear filter.

10. The apparatus according to claim 9, wherein the nonlinear filter is the R_(E) filter.

11. The apparatus according to claim 8, wherein the preset threshold is sixteen times MSE of the input vector.

12. The apparatus according to claim 8, wherein the step of performing P×Q nearest neighbor substitution includes comparing the output motion vector against motion vectors of neighboring P×Q blocks and outputting the motion vector having the smallest MSE.

13. The apparatus according to claim 8, wherein the step of performing S×T nearest neighbor substitution includes comparing the selected motion vector against motion vectors of neighboring S×T blocks and outputting the motion vector having the smallest MSE.

14. The apparatus according to claim 8, wherein M×N is 32×32, P×Q is 16×16 and S×T is 8×8.